ABSTRACT

In this paper an expert system for diagnosis and recovery of failures in the freon cooling loop 
of the European retrievable experiment carrier EURECA is described. 

This system demonstrates the feasibility of a functional scope of expert diagnostic systems 
which appears to be essential for practical applications of such systems in space technology. 
This scope comprises : early warning and treatment of incomplete information, fault tolerance, 
identification of failure superpositions (particularly involving failed sensors), intelligent reaction 
to unforeseen events and detailed status display for optimal recovery action. 

1. INTRODUCTION 

Expert systems for failure diagnosis and recovery, whether implemented on autonomous
spacecraft or applied as ground-based consultant systems, appear to require a certain enhanced
functionality. This covers in particular early warning and treatment of incomplete information,
failure superposition, intelligent reaction to unforeseen events, tolerance to isolated faults in
the knowledge base and detailed status display for optimal recovery action.

In this paper an expert system is described which, in its first development phase, has been 
used to implement and assess the technology required for the realization of these requirements 
for the diagnostic process. As such it could initially be used as a ground-based operator's
consulting system, serving at the same time as a test-bed for further refinement of this
technology for subsequent on-board applications.

2. THE COOLING LOOP OF EURECA 

The cooling loop of EURECA was chosen as the system to be monitored, with the aim of later
expanding the knowledge base to include the thermal control unit as well, thus providing the
complete TCS as the application domain in the final stage.

The cooling loop of EURECA is depicted in Figure 2. The pump package (FPP) consists of two
redundant pumps, one of which drives the cooling medium, freon, through the experiment line
(upper branch), where the freon cools the experiments, then through the two radiators
(Rad +x and Rad -x), where the heat taken up by the freon is dissipated, and finally through
the equipment line (lower branch), where equipment such as batteries and power distribution
units is kept approximately at room temperature.

The fine tuning of the freon temperature is achieved by small adaptive heaters positioned 
along the experiment and equipment line as well as on the radiators. 

The sensors are shown as icons in Figure 2 and measure pump inlet pressure (Pin), pump
outlet pressure (Pout), pump inlet temperature (Tin), radiator inlet temperature (TSi), radiator
outlet temperature (TSo), freon quantity in the accumulator (Acc. Q.) and the electrical pump
current (Ip). Moreover, a delta pressure switch (dP) gives a signal if the pump pressure head has
broken down. (It should be mentioned that the question whether all these sensors will be
available for EURECA is still under discussion. However, as the initial development stage of the 
expert system only aims at a technology demonstration, the exact number and type of sensors 
used is not crucial). 

In addition to these direct measurements, the total power consumed by the adaptive heaters
for the experiment line (Pwr Ex), the radiators (Pwr +x and -x) and the equipment (Pwr Eq) is
also used, since changes in this power can serve as indirect indications of changes in flow. This
compensates to a certain extent for the fact that a direct flow measurement will not be
available for the EURECA cooling loop.


3. KNOWLEDGE REPRESENTATION 
EUREX D has a rule-based knowledge representation which is characterized by : 
o Global monitoring : 

Diagnosis is not only based on single sensor monitoring but on a global assessment of the 
concurrent readings of all sensors, identifying characteristic data patterns and relations (such 
as temperature gradients along the different sections of the cooling loop) thus providing a 
broad basis for the diagnostic process. 

o Indirect monitoring : 

Apart from a direct interpretation of sensor signals such as interpreting the activation of the 
delta pressure switch as pump performance degradation, extensive use is made of indirect 
monitoring such as taking a flow reduction as additional evidence for the pump performance 
degradation, or using temperature readings for a measurement of flow as indicated above, etc. 

o Redundant monitoring : 

In the reasoning process ALL known evidence for a given anomaly is taken into account thus 
allowing for incomplete or even partially erroneous information or knowledge to be processed 
without grave consequences since the system's dependency on individual bits of information 
or knowledge is greatly reduced. 

o Multi-valued logic : 

Taking all known evidence into account implies the use of non-conclusive evidence : 

For instance, the flow reduction mentioned above as being indicative of a pump performance
degradation could also indicate a flow blockage or even a superposition of both anomalies.
Therefore rules generally hold only partially, which is represented by implication strengths α,
taking values between 0 and 1, assigned to the rules.

o Causal connectivity : 

Although EUREX D does not reason "from first principles" but relies on knowledge based on
engineering experience and heuristics, it is able to identify causal connections between states
(e.g. a flow blockage being the cause of a flow reduction) for greater transparency and
completeness in the description of the system's state. The knowledge necessary to identify
these causal connections is provided by pointers assigned to the states.
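
The representation described above can be illustrated by a minimal sketch. It is not the
KEE implementation of EUREX D: the class names, the field names, the direction of the
causal pointers and all numerical implication strengths are assumptions chosen for
illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One piece of evidence: if `observation` holds, `state` is supported."""
    observation: str   # e.g. "flow reduced"
    state: str         # name of the supported system state
    strength: float    # implication strength, between 0 and 1


@dataclass
class State:
    """A system state with pointers to the states that can cause it."""
    name: str
    causes: list = field(default_factory=list)


def causal_chain(states, name):
    """Follow the first cause pointer of each state back to a primary cause."""
    chain = [name]
    seen = {name}
    while states[name].causes:
        name = states[name].causes[0]
        if name in seen:          # guard against cyclic pointers
            break
        chain.append(name)
        seen.add(name)
    return chain


# Hypothetical knowledge-base fragment (strengths are invented values).
rules = [
    Rule("delta pressure switch active", "PUMP.PERFORMANCE.DEGRADATION", 0.8),
    Rule("flow reduced", "PUMP.PERFORMANCE.DEGRADATION", 0.5),
    Rule("flow reduced", "FLOW.BLOCKAGE", 0.5),
]
states = {
    "FLOW.REDUCTION": State("FLOW.REDUCTION", causes=["FLOW.BLOCKAGE"]),
    "FLOW.BLOCKAGE": State("FLOW.BLOCKAGE"),
}

print(causal_chain(states, "FLOW.REDUCTION"))
```

In this sketch the same observation ("flow reduced") supports two different states with
strengths below 1, which is the non-conclusive evidence referred to under multi-valued logic.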

4. INFERENCE MECHANISM 

The various steps in the knowledge-based data processing, which are also reflected in the 
system’s architecture shown in Figure 1, are given by : 

o Data processing : 

i.e. the computation of temperature gradients along the various sections of the cooling loop, 
pressure differences etc. 
o Data classification :

i.e. local classification of the data in relation to the nominal interval, generating
"observations" such as "temperature high", "pressure normal" etc.

o Sensor state assessment :

Dedicated rules check the plausibility of sensor readings on the basis of these observations 
and assign belief values to the sensors : 1 for normal operation, 0 for degraded sensors. 

o System state assessment :

For each system state, all rules pointing to this state are "tickled" and the implication
strengths of fired rules are collected by an accumulation function leading to an integrated
certainty factor. In the case of several inference steps (where system states serve as evidence
for other system states) composite implication strengths are computed by propagation
functions. In particular, sensor belief values of 0 simply cause any inference based on this
sensor to drop out of the reasoning process.

o State evaluation : 

States with certainty factor 1 are displayed as diagnoses. If no such state is found, the
states with the highest certainty factors are presented as possible but not conclusive
diagnoses. Diagnosed states are displayed in columns, the states in a given column always
being the cause of the states in the adjacent column to the left, thus generating a detailed
status display which is not based on the primary cause of an anomaly alone.

o Recovery actions : 

Depending on the diagnosis, appropriate recovery actions are selected. When the diagnosis 
is not conclusive, i.e. when it consists of several states with certainty factors less than one, 
the suggestions for recovery actions obviously also cannot be conclusive but are qualified by 
priority factors which are functions of the certainty factors and possible additional
information (such as the rule to always react to the most hazardous situation first, even if it has a
comparatively low certainty factor). However, this identification of recovery actions has not 
yet been implemented in EUREX D. 
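
The classification, sensor gating, accumulation and evaluation steps listed above can be
sketched as follows. The paper does not specify the accumulation and propagation functions;
the probabilistic sum and the product used here are assumptions, as are all function names,
limits, rules and strengths.

```python
def classify(value, low, high):
    """Classify a reading relative to its nominal interval [low, high]."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def accumulate(strengths):
    """Integrate implication strengths into one certainty factor.

    Assumed form: probabilistic sum, cf = 1 - prod(1 - s).
    """
    cf = 0.0
    for s in strengths:
        cf += s * (1.0 - cf)
    return cf


def propagate(cf_evidence, strength):
    """Composite strength over two inference steps (assumed: product)."""
    return cf_evidence * strength


def assess(rules, observations, sensor_belief):
    """rules: list of (sensor, observation, state, strength) tuples."""
    fired = {}
    for sensor, observation, state, strength in rules:
        if sensor_belief.get(sensor, 1) == 0:
            continue  # degraded sensor: this inference drops out
        if observations.get(sensor) == observation:
            fired.setdefault(state, []).append(strength)
    return {state: accumulate(s) for state, s in fired.items()}


def evaluate(cfs, top_n=3, eps=1e-9):
    """Return (states, conclusive): states with cf 1, else the top candidates."""
    conclusive = [s for s, cf in cfs.items() if cf >= 1.0 - eps]
    if conclusive:
        return conclusive, True
    ranked = sorted(cfs, key=cfs.get, reverse=True)
    return ranked[:top_n], False


# Hypothetical limits, readings and rules (not taken from the EUREX D knowledge base).
limits = {"PwrEx": (20.0, 40.0), "Pout": (3.0, 5.0)}
readings = {"PwrEx": 15.0, "Pout": 2.5}
observations = {k: classify(readings[k], *limits[k]) for k in readings}
rules = [
    ("PwrEx", "low", "FLOW.REDUCTION", 0.6),
    ("Pout", "low", "FLOW.REDUCTION", 0.5),
    ("Pout", "low", "PUMP.PERFORMANCE.DEGRADATION", 0.5),
]

all_ok = assess(rules, observations, {"PwrEx": 1, "Pout": 1})
pout_failed = assess(rules, observations, {"PwrEx": 1, "Pout": 0})
print(evaluate(all_ok))
print(evaluate(pout_failed))
```

With the Pout sensor assigned belief 0, both rules based on it drop out and the diagnosis
proceeds on the remaining evidence with a lower certainty factor.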

5. IMPLEMENTATION 

EUREX D is being developed on the LISP-based development environment KEE and runs 
on SYMBOLICS machines. 

Sensor-dedicated demons are responsible for the data classification described above. States 
are objects (units in KEE terminology) automatically collecting the evidence pointing to the 
states and computing the integrated certainty factors. 

Rules are grouped into classes corresponding to their firing priority in the inference process. 
Based on KEE's facilities the man-machine interface is strongly supported graphically,
facilitating knowledge acquisition and explanation of the reasoning processes.

In particular, the sensor readings are shown as bar graphs and their classification as
shaded/non-shaded areas, responding actively to the input data. Conversely, the bar graphs can
be mouse-manipulated to preset the sensor readings for an initial nominal and a final
anomalous state for an in-built test simulator. This simulator generates, at fixed time
intervals Δt, the sensor readings of the intermediate states which develop as the anomaly
evolves from the initial nominal state to the preset end-state.

At each Δt EUREX D then performs its diagnosis on the sensor readings of these intermediate
states.
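
The behaviour of such a test simulator can be sketched as below. The paper does not state
how the intermediate states are computed; linear interpolation between the nominal state and
the end-state is an assumption, as are the function name and the sensor values.

```python
def simulate(nominal, end_state, steps):
    """Yield the sensor readings of the intermediate states, one per time step.

    Assumption: each reading moves linearly from its nominal to its end value.
    """
    for k in range(1, steps + 1):
        frac = k / steps
        yield {
            name: nominal[name] + frac * (end_state[name] - nominal[name])
            for name in nominal
        }


# Hypothetical preset: pump outlet pressure falls from 4.0 to 2.0 in four steps.
nominal = {"Pout": 4.0}
end_state = {"Pout": 2.0}
trajectory = list(simulate(nominal, end_state, steps=4))
for reading in trajectory:
    print(reading)  # a diagnosis would be performed on each intermediate state
```
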
6. FUNCTIONAL SCOPE

EUREX D displays the following functional scope corresponding to the requirements listed 
in the introduction : 

o Early warning and treatment of incomplete information : 

Because the reasoning mechanism is based on multi-valued logic and can process
non-conclusive evidence, the system does not depend on the sensor data patterns
characteristic of fully evolved anomalies for its diagnosis, but is able to process first
symptoms of developing failures (i.e. incomplete information), presenting assumptions of
several possible failures (weighted with certainty factors less than 1) for early warning and
preventive action. An example of an assessment of the evolving symptoms of a flow blockage
is given in Figures 2-3. A similar treatment applies when the incompleteness of information
is due to other reasons, such as reduced sensor availability etc.

o Failure superposition : 

Obviously, anticipating all possible failure superpositions in a diagnostic system is
impossible and, like most other diagnostic systems, EUREX D is designed for the diagnosis of
single failures only. However, it does display the ability to treat failure superpositions to a
fair extent :

Degraded sensors are detected by dedicated sensor rules and taken out of the diagnostic
process as described in Chapter 4. The diagnosis of simultaneously occurring system anomalies
then proceeds on the basis of incomplete information as shown above. Thus concurrent
sensor failures and system anomalies can be discerned.

Concurrent system anomalies can obviously be detected if they do not have opposing effects
on the same sensors. Otherwise the system will again offer several assumptions with certainty
factors less than 1. For example, the superposition of a flow blockage and a leak leads to the
assumption of pump performance degradation (c.f. = 0.5), leakage (c.f. = 0.7) and flow
blockage (c.f. = 0.5).

o Treatment of unforeseen events : 

Processing of non-conclusive evidence in EUREX D also facilitates intelligent reaction to
non-premeditated events. On the basis of the subset of recognized features, known situations
are enumerated which have the greatest similarity to the unforeseen event.

At the same time, the fact that these are just assumptions is signalled by certainty factors less 
than 1. 

o Tolerance to isolated faults in the knowledge-base : 

Regardless of whether such faults are due to erroneous coding or to irradiation of computer
memory in the case of on-board expert systems, it is imperative that an expert system does
not react catastrophically in case of such an error.

Due to the extensive knowledge redundancy (see Chapter 3) this tolerance is indeed given to a
great extent in EUREX D, where most isolated faults are "drowned" in the "majority vote"
of the remaining evidence.

o Detailed status display : 

The inclusion of states causally connected to primary failure states for greater detail of status 
display has already been described in Chapters 3 and 4. 
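
The effect of knowledge redundancy on isolated faults can be illustrated with a small
numerical sketch. The accumulation function (a probabilistic sum) and all implication
strengths are assumptions for illustration and are not taken from EUREX D.

```python
def accumulate(strengths):
    """Assumed accumulation function: cf = 1 - prod(1 - s)."""
    cf = 0.0
    for s in strengths:
        cf += s * (1.0 - cf)
    return cf


# Hypothetical fired evidence for two competing states.
evidence = {"FLOW.BLOCKAGE": [0.6, 0.5, 0.7], "LEAKAGE": [0.3]}
intact = {state: accumulate(s) for state, s in evidence.items()}

# Isolated fault: one of the three rules supporting FLOW.BLOCKAGE is lost.
faulty_evidence = {"FLOW.BLOCKAGE": [0.6, 0.7], "LEAKAGE": [0.3]}
degraded = {state: accumulate(s) for state, s in faulty_evidence.items()}

print(intact)
print(degraded)
print(max(degraded, key=degraded.get))  # leading hypothesis after the fault
```

Under these assumptions the loss of a single rule lowers the certainty factor of the
leading state only moderately and leaves the ranking of the hypotheses unchanged.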

7. CONCLUSIONS 

An expert system for the failure diagnosis of the cooling loop of EURECA has been described
which displays a number of features that appear to be essential for practical applications of
such expert systems in space technology, the main aspects of the underlying methodology
being given by knowledge redundancy and multi-valued logic.

It should be noted, however, that the management of uncertainty involved still poses some
problems in the case of very large knowledge bases, and future work will have to concentrate
on this aspect.
Figure 1 : Components of EUREX D

[Screen display "EURECA - TCS" with Diagnostics Window; legible entries: FLOW.REDUCTION,
PUMP.PERFORMANCE.DEGRADATION, PUMP.BEARING.DEGRADATION, FLOW.BLOCKAGE,
CHECK.VALVE.LEAKAGE.OF.STANDBY.PUMP; menu items: Setup, Monitor Limits, Explain]

Figure 2 : Display of cooling loop and diagnostic window showing an assessment of
first symptoms of a flow blockage

Figure 3 : Diagnosis of fully evolved flow blockage